Master data reshaping with Python Pandas pivot tables. This guide explains the syntax, advanced techniques, and practical examples for global data analysis.
Python Pandas Pivot Tables: A Comprehensive Guide to Data Reshaping
In the world of data analysis, the ability to summarize, aggregate, and restructure data is not just a skill; it is a superpower. Raw data, in its native form, often resembles a vast, detailed ledger: rich in information but hard to interpret. To extract meaningful insights, you need to turn that ledger into a concise summary. This is exactly where pivot tables excel, and for Python programmers the Pandas library provides a powerful and flexible tool for the job: pivot_table().
This guide is written for a global audience of data analysts, scientists, and Python enthusiasts. We will dig into the mechanics of Pandas pivot tables, from basic concepts to advanced techniques. Whether you are summarizing sales from different continents, analyzing regional climate data, or tracking project metrics for a distributed team, mastering pivot tables will fundamentally change how you explore data.
What Exactly Is a Pivot Table?
If you have used spreadsheet software such as Microsoft Excel or Google Sheets, you are probably familiar with the concept of a pivot table. It is an interactive table that lets you reorganize and summarize selected columns and rows from a larger dataset to obtain the report you want.
A pivot table does two important things:
- Aggregation: it computes summary statistics (such as sum, mean, or count) for numeric data grouped by one or more categories.
- Reshaping: it converts data from a "long" format to a "wide" format. Instead of holding all values in a single column, it "pivots" the unique values of a column into new columns in the output.
The Pandas pivot_table() function brings this capability directly into the Python data analysis workflow, enabling reproducible, scriptable, and scalable data reshaping.
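To make the two jobs concrete before building the full dataset, here is a minimal sketch on a hypothetical toy table (the store, quarter, and sales names are invented for illustration only). It aggregates duplicate store/quarter rows and turns the quarter values into columns.

```python
import pandas as pd

# Hypothetical toy data in "long" format: one row per recorded sale
long_df = pd.DataFrame({
    "store":   ["A", "A", "A", "B", "B"],
    "quarter": ["Q1", "Q1", "Q2", "Q1", "Q2"],
    "sales":   [100, 50, 200, 300, 400],
})

# Aggregation: sum sales per (store, quarter)
# Reshaping: the unique quarter values become the output columns
wide_df = pd.pivot_table(long_df, values="sales", index="store",
                         columns="quarter", aggfunc="sum")
print(wide_df)
```

The two Q1 rows for store A collapse into a single cell (150), and the five long rows become a 2 x 2 wide table.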
Setting Up the Environment and Sample Data
Before you begin, make sure the Pandas library is installed. If it is not, you can install it with pip, Python's package installer.
pip install pandas
Next, import it in your Python script or notebook.
import pandas as pd
import numpy as np
Creating a Global Sales Dataset
To keep the examples practical and globally relevant, we will create a synthetic dataset representing sales data for a multinational e-commerce company. It contains information about sales across different regions, countries, and product categories.
# Create a dictionary of data
data = {
    'TransactionID': range(1, 21),
    'Date': pd.to_datetime([
        '2023-01-15', '2023-01-16', '2023-01-17', '2023-02-10', '2023-02-11',
        '2023-02-12', '2023-03-05', '2023-03-06', '2023-03-07', '2023-01-20',
        '2023-01-21', '2023-02-15', '2023-02-16', '2023-03-10', '2023-03-11',
        '2023-01-18', '2023-02-20', '2023-03-22', '2023-01-25', '2023-02-28'
    ]),
    'Region': [
        'North America', 'Europe', 'Asia', 'North America', 'Europe', 'Asia', 'North America', 'Europe', 'Asia', 'Europe',
        'Asia', 'North America', 'Europe', 'Asia', 'North America', 'Asia', 'Europe', 'North America', 'Europe', 'Asia'
    ],
    'Country': [
        'USA', 'Germany', 'Japan', 'Canada', 'France', 'India', 'USA', 'UK', 'China', 'Germany',
        'Japan', 'USA', 'France', 'India', 'Canada', 'China', 'UK', 'USA', 'Germany', 'India'
    ],
    'Product_Category': [
        'Electronics', 'Apparel', 'Electronics', 'Books', 'Apparel', 'Electronics', 'Books', 'Electronics', 'Apparel',
        'Apparel', 'Books', 'Electronics', 'Books', 'Apparel', 'Electronics', 'Books', 'Apparel', 'Books', 'Electronics', 'Electronics'
    ],
    'Units_Sold': [10, 5, 8, 20, 7, 12, 15, 9, 25, 6, 30, 11, 18, 22, 14, 28, 4, 16, 13, 10],
    'Unit_Price': [1200, 50, 900, 15, 60, 1100, 18, 950, 45, 55, 12, 1300, 20, 40, 1250, 14, 65, 16, 1150, 1050]
}
# Create DataFrame
df = pd.DataFrame(data)
# Calculate Revenue
df['Revenue'] = df['Units_Sold'] * df['Unit_Price']
# Display the first few rows of the DataFrame
print(df.head())
With its mix of categorical data (region, country, product category), numeric data (Units_Sold, Revenue), and time-series data (Date), this dataset gives us a solid foundation.
Anatomy of pivot_table()
The Pandas pivot_table() function is very versatile. Let's break down its most important parameters.
pandas.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, margins_name='All')
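- data: the DataFrame to pivot.
- values: the column(s) containing the data to aggregate. If not specified, all remaining numeric columns are used.
- index: the column(s) whose unique values form the rows of the new pivot table. These are sometimes called the grouping keys.
- columns: the column(s) whose unique values are "pivoted" to form the columns of the new table.
- aggfunc: the aggregation function applied to values. It can be a string such as 'sum', 'mean', 'count', 'min', or 'max', or a function such as np.sum. You can pass a list of functions to compute several statistics at once, or a dictionary to apply different functions to different columns. The default is 'mean'.
- fill_value: the value used to replace missing results (NaN) in the pivot table.
- margins: a boolean. If set to True, row and column subtotals (also known as grand totals) are added.
- margins_name: the name of the row/column that holds the totals when margins=True. The default is 'All'.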
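The defaults are easy to overlook, so here is a small sketch on a hypothetical three-column table (team, score, and hours are made-up names). It first calls pivot_table() with only index set, then overrides values, aggfunc, margins, and margins_name explicitly.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the parameter defaults
toy = pd.DataFrame({
    "team":  ["x", "x", "y", "y"],
    "score": [10, 20, 30, 50],
    "hours": [1.0, 3.0, 2.0, 4.0],
})

# Only index given: every remaining numeric column is aggregated,
# and the default aggfunc ('mean') is applied
defaults = pd.pivot_table(toy, index="team")
print(defaults)

# Explicit choices: one value column, sum instead of mean,
# plus a totals row with a custom label
explicit = pd.pivot_table(toy, index="team", values="score",
                          aggfunc="sum", margins=True, margins_name="Total")
print(explicit)
```

Relying on the default 'mean' is a common source of surprise when a sum was intended, which is one reason to spell out aggfunc in real code.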
Your First Pivot Table: A Simple Example
Let's start with a common business question: "What is the total revenue generated by each product category?"
To answer it, we need to:
- Use Product_Category for the rows (index).
- Aggregate the Revenue column (values).
- Use sum as the aggregation function (aggfunc).
# Simple pivot table to see total revenue by product category
category_revenue = pd.pivot_table(df,
                                  values='Revenue',
                                  index='Product_Category',
                                  aggfunc='sum')
print(category_revenue)
Output:
                  Revenue
Product_Category
Apparel              3265
Books                1938
Electronics         98200
This gives a clear, concise summary right away. The raw 20-row transaction log has been reshaped into a 3-row table that answers the question directly. This is the fundamental power of a pivot table.
Adding a Column Dimension
Next, let's extend this. What if we want to see total revenue per product category, but also broken down by region? This is where the columns parameter comes in.
# Pivot table with index and columns
revenue_by_category_region = pd.pivot_table(df,
                                            values='Revenue',
                                            index='Product_Category',
                                            columns='Region',
                                            aggfunc='sum')
print(revenue_by_category_region)
Output:
Region               Asia   Europe  North America
Product_Category
Apparel            2005.0   1260.0            NaN
Books               752.0    360.0          826.0
Electronics       30900.0  23500.0        43800.0
This output is much richer. The unique values of the Region column ('Asia', 'Europe', 'North America') have been pivoted into new columns, so it is easy to compare how each product category performs across regions. Notice also the NaN (Not a Number) value: it indicates that no 'Apparel' sales were recorded for 'North America' in the dataset. That is valuable information in itself!
Advanced Pivot Techniques
The basics are powerful, but the real flexibility of pivot_table() shows in its advanced features.
Handling Missing Values with fill_value
The NaN in the previous table is accurate, but for reporting or further calculations it may be preferable to display it as zero. The fill_value parameter makes this easy.
# Using fill_value to replace NaN with 0
revenue_by_category_region_filled = pd.pivot_table(df,
                                                   values='Revenue',
                                                   index='Product_Category',
                                                   columns='Region',
                                                   aggfunc='sum',
                                                   fill_value=0)
print(revenue_by_category_region_filled)
Output:
Region             Asia  Europe  North America
Product_Category
Apparel            2005    1260              0
Books               752     360            826
Electronics       30900   23500          43800
The table is now cleaner and easier to read, especially for a non-technical audience.
Working with Multiple Indexes (Hierarchical Indexing)
What if you need to group the rows by more than one category? For example, let's break sales down by Region and then by Country within each region. You can pass a list of columns to the index parameter.
# Multi-level pivot table using a list for the index
multi_index_pivot = pd.pivot_table(df,
                                   values='Revenue',
                                   index=['Region', 'Country'],
                                   aggfunc='sum',
                                   fill_value=0)
print(multi_index_pivot)
Output:
                       Revenue
Region        Country
Asia          China       1517
              India      24580
              Japan       7560
Europe        France       780
              Germany    15530
              UK          8810
North America Canada     17800
              USA        26826
Pandas automatically created a MultiIndex for the rows. This hierarchical structure is ideal for drilling down into the data and seeing nested relationships. You can apply the same logic to the columns parameter to create hierarchical columns.
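As a sketch of that last point, the snippet below passes a list to columns instead of index. The orders table and its column names are hypothetical and exist only to keep the example self-contained.

```python
import pandas as pd

# Hypothetical toy data with two candidate column dimensions
orders = pd.DataFrame({
    "category": ["Books", "Books", "Toys", "Toys", "Books", "Toys"],
    "region":   ["East", "West", "East", "West", "East", "West"],
    "channel":  ["Web", "Web", "Store", "Web", "Store", "Store"],
    "amount":   [10, 20, 30, 40, 50, 60],
})

# A list in `columns` yields a two-level column MultiIndex (region, channel)
nested = pd.pivot_table(orders, values="amount", index="category",
                        columns=["region", "channel"], aggfunc="sum",
                        fill_value=0)
print(nested)

# A tuple selects one leaf column; a single key selects a whole top-level group
print(nested[("East", "Web")])
print(nested["East"])
```

Hierarchical columns get wide quickly, so it is usually worth limiting them to two levels in anything meant for human readers.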
Using Multiple Aggregation Functions
Sometimes a single summary statistic is not enough. You may want to see both the total revenue (sum) and the average transaction size (mean) for each group. You can pass a list of functions to aggfunc.
# Using multiple aggregation functions
multi_agg_pivot = pd.pivot_table(df,
                                 values='Revenue',
                                 index='Region',
                                 aggfunc=['sum', 'mean', 'count'])
print(multi_agg_pivot)
Output:
                  sum         mean   count
              Revenue      Revenue Revenue
Region
Asia            33657  4808.142857       7
Europe          25120  3588.571429       7
North America   44626  7437.666667       6
This single command provides a comprehensive summary: total revenue, average revenue per transaction, and the number of transactions for each region. Note how Pandas creates hierarchical columns to organize the output.
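Applying Different Functions to Different Values
You can get even more granular. Suppose you want the sum of Revenue but the mean of Units_Sold. You can pass a dictionary to aggfunc, where the keys are column names (the values columns) and the dictionary values are the desired aggregation functions.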
# Different aggregations for different values
dict_agg_pivot = pd.pivot_table(df,
                                index='Region',
                                values=['Revenue', 'Units_Sold'],
                                aggfunc={
                                    'Revenue': 'sum',
                                    'Units_Sold': 'mean'
                                },
                                fill_value=0)
print(dict_agg_pivot)
Output:
               Revenue  Units_Sold
Region
Asia             33657   19.285714
Europe           25120    8.857143
North America    44626   14.333333
This level of control is what makes pivot_table() such a strong tool for sophisticated data analysis.
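Beyond the built-in names, aggfunc also accepts your own callable. The following is a minimal sketch, assuming a hypothetical helper called value_range and an invented sales table; it reports the spread between the largest and smallest transaction per region.

```python
import pandas as pd

def value_range(series):
    """Hypothetical custom aggregation: max minus min within one group."""
    return series.max() - series.min()

# Hypothetical toy data for illustration only
sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "South"],
    "revenue": [100, 250, 80, 120, 300],
})

# Any callable that reduces a Series to a single value can serve as aggfunc
spread = pd.pivot_table(sales, values="revenue", index="region",
                        aggfunc=value_range)
print(spread)
```

Custom callables are generally slower than the built-in string names on large data, so they are best reserved for statistics that have no built-in equivalent.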
Calculating Grand Totals with margins
For reporting, row and column totals are often essential. The margins=True argument provides them with no extra effort.
# Adding totals with margins=True
revenue_with_margins = pd.pivot_table(df,
                                      values='Revenue',
                                      index='Product_Category',
                                      columns='Region',
                                      aggfunc='sum',
                                      fill_value=0,
                                      margins=True,
                                      margins_name='Grand Total')  # Custom name for totals
print(revenue_with_margins)
Output:
Region             Asia  Europe  North America  Grand Total
Product_Category
Apparel            2005    1260              0         3265
Books               752     360            826         1938
Electronics       30900   23500          43800        98200
Grand Total       33657   25120          44626       103403
Pandas automatically computes the total for each row (total revenue per product category across all regions), the total for each column (total revenue per region across all categories), and the grand total of all the data in the bottom-right corner.
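One detail worth knowing is that the margin reuses whatever aggfunc you chose; it is not always a sum. The sketch below, on a hypothetical four-row table, uses aggfunc='mean' to show that the 'All' row is computed from the underlying rows rather than by averaging the group results.

```python
import pandas as pd

# Hypothetical toy data: group "a" has three rows, group "b" has one
toy = pd.DataFrame({
    "group": ["a", "a", "a", "b"],
    "val":   [1, 2, 3, 10],
})

with_margins = pd.pivot_table(toy, values="val", index="group",
                              aggfunc="mean", margins=True)
print(with_margins)

# Group means are 2.0 and 10.0, but the "All" row is the mean of the
# four underlying values (1, 2, 3, 10), not the mean of the two group means
overall = with_margins.loc["All", "val"]
print(overall)
```

This matters most for unbalanced groups, where a mean of means and an overall mean can differ substantially.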
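A Practical Use Case: Time-Based Analysis
Pivot tables are not limited to static categories; they are also extremely useful for analyzing time-series data. Let's find the total revenue for each month.
First, we need to extract the month from the 'Date' column. Pandas lets you do this with the .dt accessor.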
# Extract month from the Date column
df['Month'] = df['Date'].dt.month_name()
# Pivot to see monthly revenue by product category
monthly_revenue = pd.pivot_table(df,
                                 values='Revenue',
                                 index='Month',
                                 columns='Product_Category',
                                 aggfunc='sum',
                                 fill_value=0)
# Optional: Order the months correctly
month_order = ['January', 'February', 'March']
monthly_revenue = monthly_revenue.reindex(month_order)
print(monthly_revenue)
Output:
Product_Category  Apparel  Books  Electronics
Month
January               580    752        34150
February              680    660        38000
March                2005    526        26050
This table clearly shows each category's sales performance over time, making trends, seasonality, or anomalies easy to spot.
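Month names sort alphabetically, which is why the example above reorders the rows by hand. One alternative, sketched here on a hypothetical log table, is to group by a monthly period instead: periods sort chronologically and also keep the year, so January 2023 and January 2024 would not be merged.

```python
import pandas as pd

# Hypothetical toy data, deliberately out of chronological order
log = pd.DataFrame({
    "date": pd.to_datetime(["2023-03-01", "2023-01-15",
                            "2023-02-10", "2023-01-20"]),
    "revenue": [30, 10, 20, 5],
})

# Convert each date to its calendar month (a Period such as 2023-01)
log["period"] = log["date"].dt.to_period("M")

# The resulting index sorts chronologically without a manual reindex
monthly = pd.pivot_table(log, values="revenue", index="period", aggfunc="sum")
print(monthly)
```

The trade-off is presentation: period labels such as 2023-01 are less friendly than month names, so a reporting layer may still want to format them.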
pivot_table() vs. groupby(): What Is the Difference?
This is a common question for people learning Pandas. The two functions are closely related; in fact, pivot_table() is built on top of groupby().
groupby() is the more general and fundamental operation. It groups data by some criteria and lets you apply an aggregation function. The result is usually a Pandas Series or DataFrame with a hierarchical index, but it stays in long format. pivot_table() is a specialized tool that groups the data and then reshapes it. Its main purpose is to convert data from long format into a human-readable wide format.
Let's revisit the first example using groupby().
# Same result as our first pivot table, but using groupby
category_revenue_groupby = df.groupby('Product_Category')['Revenue'].sum()
print(category_revenue_groupby)
The result is a Pandas Series that is functionally equivalent to the DataFrame from the first pivot table. However, once we introduce a second grouping key (such as 'Region'), the difference becomes clear.
# Grouping by two columns
groupby_multi = df.groupby(['Product_Category', 'Region'])['Revenue'].sum()
print(groupby_multi)
Output (a Series with a MultiIndex):
Product_Category  Region
Apparel           Asia              2005
                  Europe            1260
Books             Asia               752
                  Europe             360
                  North America      826
Electronics       Asia             30900
                  Europe           23500
                  North America    43800
Name: Revenue, dtype: int64
To get the same "wide" format as pivot_table(index='Product_Category', columns='Region'), you need to call unstack() after groupby().
# Replicating a pivot table with groupby().unstack()
groupby_unstack = df.groupby(['Product_Category', 'Region'])['Revenue'].sum().unstack(fill_value=0)
print(groupby_unstack)
This produces the same output as the pivot table with columns and fill_value=0. You can therefore think of pivot_table() as a convenient shortcut for the common groupby().aggregate().unstack() workflow.
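If you want to convince yourself of the equivalence, a small self-contained check can help. The sketch below builds both results from a hypothetical toy table and compares their labels and values.

```python
import pandas as pd

# Hypothetical toy data; some (category, region) pairs are deliberately missing
toy = pd.DataFrame({
    "category": ["A", "A", "B", "B", "B"],
    "region":   ["East", "West", "East", "East", "North"],
    "revenue":  [10, 20, 30, 40, 50],
})

via_pivot = pd.pivot_table(toy, values="revenue", index="category",
                           columns="region", aggfunc="sum", fill_value=0)

via_groupby = (toy.groupby(["category", "region"])["revenue"]
                  .sum()
                  .unstack(fill_value=0))

print(via_pivot)
print(via_groupby)

# Compare row labels, column labels, and cell values
same_labels = (list(via_pivot.index) == list(via_groupby.index)
               and list(via_pivot.columns) == list(via_groupby.columns))
same_values = bool((via_pivot.to_numpy() == via_groupby.to_numpy()).all())
print(same_labels, same_values)
```

The comparison is done on labels and values rather than on exact dtypes, since those details can vary between pandas versions.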
When to Use Which?
- Use pivot_table() when you need human-readable, wide-format output, especially for reporting or building cross-tabulations.
- Use groupby() when you need more flexibility, are performing intermediate calculations in a data processing pipeline, or when a reshaped wide format is not the end goal.
Performance and Best Practices
pivot_table() is powerful, but it is important to use it efficiently, especially with large datasets.
- Filter first, then pivot: if you only need to analyze a subset of the data (for example, sales from the past year), filter the DataFrame before applying the pivot table. This reduces the amount of data the function has to process.
- Use categorical types: for columns you frequently use as the index or columns of a pivot table (such as 'Region' or 'Product_Category'), convert them to the Pandas 'category' dtype. This can significantly reduce memory usage and speed up grouping operations.
df['Region'] = df['Region'].astype('category')
- Keep it readable: avoid creating pivot tables with too many index and column levels. It is possible, but a pivot table hundreds of columns wide and thousands of rows long can become as hard to read as the original raw data. Use pivot tables to create targeted summaries.
- Understand the aggregation: be careful with your choice of aggfunc. Using 'sum' on prices makes no sense, whereas 'mean' may be more appropriate. Always make sure the aggregation matches the question you are trying to answer.
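The first two tips can be sketched together. The example below uses a hypothetical generated table (the big name, the year column, and the row count are all invented for illustration); it measures the memory of a text column before and after converting it to 'category', then filters before pivoting. Passing observed=True is an explicit choice here so that only categories present in the filtered data appear in the result.

```python
import pandas as pd

# Hypothetical generated data: 12,000 rows with heavily repeated labels
n = 12_000
big = pd.DataFrame({
    "region":  ["Asia", "Europe", "North America", "Europe"] * (n // 4),
    "year":    [2022, 2023, 2023] * (n // 3),
    "revenue": range(n),
})

# Tip: categorical dtype stores each distinct label once plus small integer codes
bytes_as_text = big["region"].memory_usage(deep=True)
big["region"] = big["region"].astype("category")
bytes_as_category = big["region"].memory_usage(deep=True)
print(bytes_as_text, bytes_as_category)

# Tip: filter first, so the pivot only processes the rows that matter
recent = big[big["year"] == 2023]
summary = pd.pivot_table(recent, values="revenue", index="region",
                         aggfunc="sum", observed=True)
print(summary)
```

The exact byte counts depend on the pandas version and string storage in use, so treat the printed numbers as indicative rather than fixed.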
Conclusion: A Tool for Insightful Summaries
The Pandas pivot_table() function is an indispensable tool in every data analyst's toolbox. It offers a declarative, expressive, and powerful way to move from messy, detailed data to clean, insightful summaries. By understanding its core components (values, index, columns, and aggfunc) and making use of advanced features such as multi-level indexes, custom aggregations, and margins, you can reshape data to answer complex business questions in just a few lines of Python code.
The next time you face a large dataset, resist the urge to scroll through endless rows. Instead, think about the questions you need to answer and how a pivot table can reshape the data to reveal the story hidden within it. Happy pivoting!